Description: The data consist of 200 subjects from a larger study on the survival of patients following admission to an adult intensive care unit (ICU). The study used logistic regression to predict the probability of survival for these patients until their discharge from the hospital. The dependent variable is the binary variable Vital Status (STA). Nineteen possible predictor variables, both discrete and continuous, were also observed. Number of cases: 200 Variable Names:
> ICU <- read.table("./ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> str(ICU)
'data.frame': 200 obs. of 21 variables:
$ ID : int 8 12 14 28 32 38 40 41 42 50 ...
$ Status : int 0 0 0 0 0 0 0 0 0 0 ...
$ Age : int 27 59 77 54 87 69 63 30 35 70 ...
$ Sex : int 1 0 0 0 1 0 0 1 0 1 ...
$ Race : int 1 1 1 1 1 1 1 1 2 1 ...
$ Service : int 0 0 1 0 1 0 1 0 0 1 ...
$ Cancer : int 0 0 0 0 0 0 0 0 0 1 ...
$ Renal : int 0 0 0 0 0 0 0 0 0 0 ...
$ Infection : int 1 0 0 1 1 1 0 0 0 0 ...
$ CPR : int 0 0 0 0 0 0 0 0 0 0 ...
$ Systolic : int 142 112 100 142 110 110 104 144 108 138 ...
$ HeartRate : int 88 80 70 103 154 132 66 110 60 103 ...
$ Previous : int 0 1 0 0 1 0 0 0 0 0 ...
$ Type : int 1 1 0 1 1 1 0 1 1 0 ...
$ Fracture : int 0 0 0 1 0 0 0 0 0 0 ...
$ PO2 : int 0 0 0 0 0 1 0 0 0 0 ...
$ PH : int 0 0 0 0 0 0 0 0 0 0 ...
$ PCO2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Bicarbonate : int 0 0 0 0 0 1 0 0 0 0 ...
$ Creatinine : int 0 0 0 0 0 0 0 0 0 0 ...
$ Consciousness: int 1 1 1 1 1 1 1 1 1 1 ...
The ICU Admissions dataset consists of 200 observations with 21 variables. From the following observations we found;
Labelling the factor levels helps with comparative analysis and visualization
> ICU <- within(ICU, {
+ Status <- factor(Status, labels=c('Lived','Died'))
+ Sex <- factor(Sex, labels=c('Male','Female'))
+ Race <- factor(Race, labels=c('White','Black','Other'))
+ Service <- factor(Service, labels=c('Medical','Surgical'))
+ Cancer <- factor(Cancer, labels=c('No','Yes'))
+ Renal <- factor(Renal, labels=c('No','Yes'))
+ Infection <- factor(Infection, labels=c('No','Yes'))
+ CPR <- factor(CPR, labels=c('No','Yes'))
+ Previous <- factor(Previous, labels=c('No','Yes'))
+ Type <- factor(Type, labels=c('Elective','Emergency'))
+ Fracture <- factor(Fracture, labels=c('No','Yes'))
+ PCO2 <- factor(PCO2, labels=c('No','Yes'))
+ PH <- factor(PH, labels=c('No','Yes'))
+ PO2 <- factor(PO2, labels=c('No','Yes'))
+ Bicarbonate <- factor(Bicarbonate, labels=c('No','Yes'))
+ Creatinine <- factor(Creatinine, labels=c('No','Yes'))
+ Consciousness <- factor(Consciousness, labels=c('Conscious','Deep Stupor','Coma'))
+ })
> headTail(ICU) %>% datatable(rownames = TRUE, filter="top", options = list(pageLenght = 10, scrollX=T))%>% formatRound(columns=c(1:17), digits=0)
> # write.csv(ICU, file="ICUAdmissions_recoded.csv", row.names=FALSE)
> str(ICU)
'data.frame': 200 obs. of 21 variables:
$ ID : int 8 12 14 28 32 38 40 41 42 50 ...
$ Status : Factor w/ 2 levels "Lived","Died": 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 27 59 77 54 87 69 63 30 35 70 ...
$ Sex : Factor w/ 2 levels "Male","Female": 2 1 1 1 2 1 1 2 1 2 ...
$ Race : Factor w/ 3 levels "White","Black",..: 1 1 1 1 1 1 1 1 2 1 ...
$ Service : Factor w/ 2 levels "Medical","Surgical": 1 1 2 1 2 1 2 1 1 2 ...
$ Cancer : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
$ Renal : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Infection : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 1 1 1 1 ...
$ CPR : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Systolic : int 142 112 100 142 110 110 104 144 108 138 ...
$ HeartRate : int 88 80 70 103 154 132 66 110 60 103 ...
$ Previous : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 1 1 1 1 ...
$ Type : Factor w/ 2 levels "Elective","Emergency": 2 2 1 2 2 2 1 2 2 1 ...
$ Fracture : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
$ PO2 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
$ PH : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ PCO2 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Bicarbonate : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
$ Creatinine : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Consciousness: Factor w/ 3 levels "Conscious","Deep Stupor",..: 1 1 1 1 1 1 1 1 1 1 ...
Findings * The following variables were successfully recoded; Status, Sex, Race, Service, Cancer, Renal, Infection, CPR, Previous, Type,Fracture, PCO2,PH, PO2, Bicarbonate, Creatinine, and Consciousness.
> summary(ICU)
ID Status Age Sex Race
Min. : 4.0 Lived:160 Min. :16.00 Male :124 White:175
1st Qu.:210.2 Died : 40 1st Qu.:46.75 Female: 76 Black: 15
Median :412.5 Median :63.00 Other: 10
Mean :444.8 Mean :57.55
3rd Qu.:671.8 3rd Qu.:72.00
Max. :929.0 Max. :92.00
Service Cancer Renal Infection CPR Systolic
Medical : 93 No :180 No :181 No :116 No :187 Min. : 36.0
Surgical:107 Yes: 20 Yes: 19 Yes: 84 Yes: 13 1st Qu.:110.0
Median :130.0
Mean :132.3
3rd Qu.:150.0
Max. :256.0
HeartRate Previous Type Fracture PO2 PH
Min. : 39.00 No :170 Elective : 53 No :185 No :184 No :187
1st Qu.: 80.00 Yes: 30 Emergency:147 Yes: 15 Yes: 16 Yes: 13
Median : 96.00
Mean : 98.92
3rd Qu.:118.25
Max. :192.00
PCO2 Bicarbonate Creatinine Consciousness
No :180 No :185 No :190 Conscious :185
Yes: 20 Yes: 15 Yes: 10 Deep Stupor: 5
Coma : 10
> numSummary(ICU[,c("Age", "HeartRate", "Systolic"), drop=FALSE], statistics=c("mean", "sd", "IQR",
+ "quantiles"), quantiles=c(0,.25,.5,.75,1))
mean sd IQR 0% 25% 50% 75% 100% n
Age 57.545 20.05465 25.25 16 46.75 63 72.00 92 200
HeartRate 98.925 26.82962 38.25 39 80.00 96 118.25 192 200
Systolic 132.280 32.95210 40.00 36 110.00 130 150.00 256 200
===========================================================================
Variable p_1 p_10 p_25 p_50 p_75 p_90 p_99
1 Systolic 55.92 92 110 130 150 170 212.12
2 Age 16.99 21 46.75 63 72 78 91
3 HeartRate 45.98 65 80 96 118.25 136.1 162.08
4 ID 11.96 81.3 210.25 412.5 671.75 829.8 924.01
Using the xray package, general trends in the dataset were identified.
There are more male observations in the dataset than females.
Many of the admissions to the ICU were emergencies, with a about a quarter of admissions being elective. This could relate to elective surgical procedures where morbidity could have been high or complications occurred. It is unclear whether people would be preemptively admitted to the ICU for high-risk procedures of if these elective admissions could be thought of as unforeseen or emergencies in themselves.
Looking at Status, more than 75% of observations lived after their ICU admission.
While the numbers are roughly even, slightly more procedures were surgical.
More admitted patients had no chronic renal failure, previous admissions to the ICU, fracture, CPR or cancer when admitted. Most admitted patients had PO2 above or equal to 60, blood pH above or equal to 7.25, PCO2 below or equal to 45, and creatinine below or equal to 2.
Most patients admitted were conscious, with slighly more patients being comatose than in a deep stupor if unconscious.
Looking at histograms of the three numerical variables in the dataset, which were Age, Systolic and HeartRate, it could be guess that if any of the distributions were to be normal, they would be Systolic and HeartRate. Age is very obviously not normally distributed. Systolic looks like it is centered around 140, and the mean confirms this at 132 with an SD of 33. HeartRate seems to be centered around 100, and the mean is calculated to be about 99 with an SD of 26.
> library(ggplot2)
> f01<-ggplot(ICU, aes(x=Sex, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Sex")
>
> f02<-ggplot(ICU, aes(x=Race, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Race")
> f03<-ggplot(ICU, aes(x=Service, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Service")
>
> f04<-ggplot(ICU, aes(x=Cancer, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Cancer")
>
> library(Rmisc)
> multiplot(f01, f02, f03, f04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f05<-ggplot(ICU, aes(x=Renal, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Renal")
>
> f06<-ggplot(ICU, aes(x=Infection, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Infection")
>
> f07<-ggplot(ICU, aes(x=CPR, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by CPR")
>
> f08<-ggplot(ICU, aes(x=Previous, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Previous")
>
> library(Rmisc)
> multiplot(f05, f06, f07, f08, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f09<-ggplot(ICU, aes(x=Type, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Type")
>
> f10<-ggplot(ICU, aes(x=Fracture, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Fracture")
>
> f11<-ggplot(ICU, aes(x=PO2, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PO2")
>
> f12<-ggplot(ICU, aes(x=PH, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PH")
>
> library(Rmisc)
> multiplot(f09, f10, f11, f12, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> f13<-ggplot(ICU, aes(x=PCO2, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by PCO2")
>
> f14<-ggplot(ICU, aes(x=Bicarbonate, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Bicarbonate")
>
> f15<-ggplot(ICU, aes(x=Creatinine, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Creatinine")
>
> f16<-ggplot(ICU, aes(x=Consciousness, fill = Status)) +
+ theme_bw() +
+ geom_bar() +
+ labs(y = "Patient Count",
+ title = "Vital Status by Consciousness")
>
>
> library(Rmisc)
> multiplot(f13, f14, f15, f16, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> d01 <- ggplot(ICU, aes(x=Age)) +
+ geom_density(fill="green") +
+ ggtitle("Age") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=315", size=4)
>
> d02 <- ggplot(ICU, aes(x=Systolic)) +
+ geom_density(fill="green") +
+ ggtitle("Systolic") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=315", size=4)
>
> d03 <- ggplot(ICU, aes(x=HeartRate)) +
+ geom_density(fill="green") +
+ ggtitle("HeartRate") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=315", size=4)
>
> d04 <- ggplot(ICU, aes(x=ID)) +
+ geom_density(fill="green") +
+ ggtitle("ID") +
+ theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+ theme(axis.text.x=element_text(size=14 )) +
+ annotate("text", x=0.8, y=-0.001, label="Base=315", size=4)
>
> multiplot(d01, d02, d03, d04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
> n01<-ggplot(ICU, aes(x=Age, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status by Age")
>
> n02<-ggplot(ICU, aes(x=Systolic, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status Systolic")
>
> n03<-ggplot(ICU, aes(x=HeartRate, fill = Status)) +
+ theme_bw() +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status HeartRate")
>
> n04<-ggplot(ICU, aes(x=Age, fill = Status)) +
+ theme_bw() +
+ facet_wrap(~ Sex) +
+ geom_density(alpha=0.5) +
+ labs(y = "Density",
+ title = "Density distribution of Vital Status in male and female patients by Age")
>
> multiplot(n01, n02, n03, n04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))
The mean of Age, referring to the descriptive statistics for this attributes, is about 58 years. Visually, it can be seen that the distribution of Age has two peaks, one around 20 years of age and one around 70 years of age. The majority of subjects admitted to the ICU seem to be between 40-80 years old.
> densityPlot( ~ Age, data=ICU, bw=bw.SJ,adjust=1, kernel=dnorm,
+ method="adaptive")
Overlaying Status and Age, density plots show that deaths after admission follow the larger Age peak that contains middle aged to elderly individuals.
> densityPlot(Age~Status, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm, method="adaptive")
While the plot of Age and death follow the same pattern for older individuals, the association between Cancer and Age is less clear. There are three peaks for of Ages where ICU admissions involved cancer. These are around 20, 50 and 70 years of age. Given the relatively small number of individuals who had cancer involvement with their admission, and the variability in ages associated, cancer might not be the best predictor of ICU admission.
> densityPlot(Age~Cancer, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm, method="adaptive")
ICU admissions that involved infection centered around 60-65 years of age. This associations is relatively clear visually looking at the plot below.
> densityPlot(Age~Infection, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm, method="adaptive")
Another association that was explored was between Age and Consiousness. All three categories for Consciousness overlap with peaks between 50 and 75 years of age. There is a distinct peak at around 50 for deep stupor, indicating that this age might be associated with deep stupor when related to ICU admissions. However, deep stupor also has a secondary peak at around 75, which make it less clearly how this consciousness category relates to age in this dataset.
> densityPlot(Age~Consciousness, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm, method="adaptive")
Both admission Types, elective and emergency, happen more frequently with advancing age. However, elective admissions happen more frequently with older age. This might suggest that older individuals are more likely to have complications during surgical or other procedures that would require admissions to an intensive care unit.
> densityPlot(Age~Type, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm, method="adaptive")
## Relationships with HeartRate
Using a density estimate plot, the distribution of HeartRate was produced. The main peak here is at about 90 beats per minute. There appears to be secondary peak around 130 beats per minute. The mean for HeartRate calculated by Rcmdr is 99. Looking at the data, this mean might be said to not accurately portray the frequency of values for HearRate.
> densityPlot( ~ HeartRate, data=ICU,
+ bw=bw.SJ, adjust=1, kernel=dnorm,
+ method="adaptive")
Both status categories have similar peaks when plotted on a density estimate plot of HeartRate. This indicates that there might not be a difference in the Status response category based on HeartRate. People who lived did seem to have a higher density of HeartRate values around 90 than those who died. A normal adult heart rate rangest between 60 and 100 beats per minute. This graph might suggest that people who lived more often had heartrates within this range, whereas those who died seemed more likely to have higher heartrates. This could inform further analysis or research.
> densityPlot(HeartRate~Status, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm,method="adaptive")
Below is a density plot of Systolic blood pressure in mmHg. A peak appears around 130-140mmHg. The peak is quite narrow and distinct compared to the density plots of Age and HeartRate, suggesting that the majority of individuals had blood pressure readings that are reflective of this plot. The mean for this attribute was 132mmHg, which seems to be more valid than the means of the other numeric variables, based on the distribution of the frequency of values.
> densityPlot( ~ Systolic, data=ICU,
+ bw=bw.SJ, adjust=1, kernel=dnorm,
+ method="adaptive")
The distribution of Systolic values for those who died seem to be concentrated around 75 mmHg and 140mmHg. A normal Systolic blood pressure can vary widely but the American Heart Association states that blood pressure below 120mmHg is normal. However, excessively low blood pressure could be a result of bleeding, for example, and can result in insufficient blood flow to critical organs. Medications used to restore blood pressure are used when blood pressure becomes too low. This might explain the peak around 75mmHg in the death curve in the graph below.
> densityPlot(Systolic~Status, data=ICU, bw=bw.SJ, adjust=1, kernel=dnorm,method="adaptive")
> with(ICU, qqPlot(Systolic, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Systolic"))
[1] 200 179
> normalityTest(~Systolic, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Systolic
W = 0.98369, p-value = 0.0204
> with(ICU, qqPlot(HeartRate, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Heartrate"))
[1] 125 48
> normalityTest(~HeartRate, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: HeartRate
W = 0.98598, p-value = 0.04478
> with(ICU, qqPlot(Age, dist="norm", id=list(method="y", n=2,
+ labels=rownames(ICU)), main="Age"))
[1] 23 97
> normalityTest(~Age, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Age
W = 0.92836, p-value = 2.507e-08
The distribution of the ICU attributes will be explored below. Many of the attributes chosen to be tested here relate to the history of the individuals in the data set or the circumstances of the admission, rather than lab tests on admission. It would be interesting to explore Most of the categorical attributes in this dataset have two categories, making the degrees of freedom 1 for chi squared distribution.
Theoretical chi square distribution with 1 degrees of freedom
For Categorical variables with two categories:
Null Hypothesis: the attribute is distributed uniformally, the chi squared value does not exceed the hypothesized value of 3.841 (p = 0.05)
Alternative Hypothesis: the attribute is not distiributed uniformally, The chi squared value exceeds the hypothesized value of 3.841
Chi squared test - Status:
The chi squared test for Status returns a chi squared value above the null hypothesized value with a very small p value (<0.05), indicating that there is a very low risk of Type I error if the null hypothesis is rejected. It can be concluded that Status is not uniformally distributed and that statistically more individuals lived than died on admission to the ICU.
> local({
+ .Table <- with(ICU, table(Status))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Status
Lived Died
160 40
percentages:
Status
Lived Died
80 20
Chi-squared test for given probabilities
data: .Table
X-squared = 72, df = 1, p-value < 2.2e-16
Chi squared test - CPR:
Similarly, the chi squared test for CPR returns a very high chi squared value at 151, and a low p value, allowing us to reject the null hypothesis that this attribute is uniformally distributed. More individuals did not have CPR than did before ICU admission. In fact, as seen from the table below, few individuals received CPR at all, which might make it a poor predictor of ICU outcomes.
> local({
+ .Table <- with(ICU, table(CPR))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
CPR
No Yes
187 13
percentages:
CPR
No Yes
93.5 6.5
Chi-squared test for given probabilities
data: .Table
X-squared = 151.38, df = 1, p-value < 2.2e-16
> table(ICU$CPR, ICU$Status)
Lived Died
No 154 33
Yes 6 7
Chi squared test - Infection:
The chi squared value produced from this test is only slightly higher than the null hypothesized value, with a p value of 0.024, indicating a 2.4% chance of a Type I error if the null hypothesis is rejected. While the distribution can be concluded to not be uniform based on the threshold set with the null hypothesis, the uniformity of Infection among patients admitted to the ICU could be explored in further research. It should be noted that in the group of individuals who died, more had infections than not.
> local({
+ .Table <- with(ICU, table(Infection))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Infection
No Yes
116 84
percentages:
Infection
No Yes
58 42
Chi-squared test for given probabilities
data: .Table
X-squared = 5.12, df = 1, p-value = 0.02365
> table(ICU$Infection, ICU$Status)
Lived Died
No 100 16
Yes 60 24
Chi squared test - Previous:
The chi squared value here exceeds the null hypothesized value with a p value much smaller than 0.05, allowing us to reject the null hypothesis that this attribute is uniformally distributed. More individuals in both status categories did not have a previous ICU admission.
> local({
+ .Table <- with(ICU, table(Previous))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Previous
No Yes
170 30
percentages:
Previous
No Yes
85 15
Chi-squared test for given probabilities
data: .Table
X-squared = 98, df = 1, p-value < 2.2e-16
Chi squared test - Sex:
The chi squared value returned from this test is 11.52, which is not as far from the null hypothesized value as some of the other values generated from other tests. The value and p value of 0.00069 still indicated that the null hypothesis should be reject and that Sex is not uniformally distributed. There are more men in this dataset than females.
> local({
+ .Table <- with(ICU, table(Sex))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Sex
Male Female
124 76
percentages:
Sex
Male Female
62 38
Chi-squared test for given probabilities
data: .Table
X-squared = 11.52, df = 1, p-value = 0.0006885
In both sex categories, despite the inequality in the number of observations in each group, there are still more individuals who lived than died.
> with(ICU, Barplot(Sex, by=Status,
+ style="divided", legend.pos="above",
+ xlab="Sex", ylab="Frequency"))
Chi squared test - Type:
The distribution of Type can be concluded to be non-uniform as the null hypothesis should be rejected based on the chi squared value produced and low p value. As seen in exploratory data analysis, the there are more individuals who are had emergency admissions than elective admissions. This difference in distribution is statistically significant. Fewer individuals died when they were admitted to the ICU on an elective basis.
> local({
+ .Table <- with(ICU, table(Type))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.5,0.5)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Type
Elective Emergency
53 147
percentages:
Type
Elective Emergency
26.5 73.5
Chi-squared test for given probabilities
data: .Table
X-squared = 44.18, df = 1, p-value = 2.995e-11
> with(ICU, Barplot(Type, by=Status,
+ style="divided", legend.pos="above",
+ xlab="Type", ylab="Frequency"))
For Categorical variables with three categories:
Null Hypothesis: the attribute is distributed uniformally, the chi squared value does not exceed the hypothesized value of 5.992 (p = 0.05)
Alternative Hypothesis: the attribute is not distiributed uniformally, The chi squared value exceeds the hypothesized value of 5.992 (p=0.05)
Theoretical chi square distribution with 2 degrees of freedom
Chi squared test - Consciousness
The null hypothesis for this attribute’s distribution is that an equal number of observations will be found in each category of Consciousness. The chi squared value produced by this test exceeds the null hypothesized value with a low p value, indicating that the observations in this category are statistically not uniform. This could have been guessed from counts of observations in each category, where most individuals were in the conscious category.
> local({
+ .Table <- with(ICU,
+ table(Consciousness))
+ cat("\ncounts:\n")
+ print(.Table)
+ cat("\npercentages:\n")
+ print(round(100*.Table/sum(.Table),
+ 2))
+ .Probs <- c(0.333333333333333,
+ 0.333333333333333,0.333333333333333)
+ chisq.test(.Table, p=.Probs)
+ })
counts:
Consciousness
Conscious Deep Stupor Coma
185 5 10
percentages:
Consciousness
Conscious Deep Stupor Coma
92.5 2.5 5.0
Chi-squared test for given probabilities
data: .Table
X-squared = 315.25, df = 2, p-value < 2.2e-16
Distribution of the three numerical variables in the ICU dataset was evaluated using quantile-quantile plots and the Shapiro-Wilk’s test for normality. Confidence intervals will be produced for the means of each attribute.
Null Hypothesis: Age, HeartRate and Systolic are normally distributed
Alternative Hypothesis: Age, HeartRate and Systolic are not normally distributed
Tests of normality - Age:
Recall that a quantile-quantile plot should produce a nearly linear plot using dataset values, with the intercept going through zero, if the null hypothesis is satisfied. Below, the plotted points of Age do not conform well to the line of best fit, and are frequently outside of the confidence bands. This supports what might have been hypothesized when the histogram of Age was produced: this distribution is not clearly centered around a mean. This plot suggests that the null hypothesis that Age is normally distributed should be rejected.
> with(ICU, qqPlot(Age, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ plot of Age"))
[1] 23 97
Using the Shapiro-Wilk normality test, the conclusions drawn from the QQ plot can be supported, with a very low risk of Type I error, base on this p value. The Shapiro Wilk test is sensitive when used on larger datasets, so this result should be taken into account with the other tests used.
> normalityTest(~Age, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Age
W = 0.92836, p-value = 2.507e-08
Using the t-test, given the confidence interval for this attribute is between 54.75 and 60.34, an interval that does not contain 0. (choosing not to add this as I do not have a test mean - need to refresh on what a true mean is )
Tests of normality - HeartRate:
The QQ plot of HeartRate appears to adhere well to the line of best fit, despite several points in the middle of the graph lying along one side of the confidence band. Most of the points here fall within the confidence band. There is a positive skew of this data towards the lower values of HeartRate. Visually, one might conclude that this data is normally distributed.
> with(ICU, qqPlot(HeartRate, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ plot of HeartRate"))
[1] 125 48
The results of the Shapiro-Wilk test very narrowly allow one to reject the null hypothesis of normal distribution. However the p value is very close to the threshold p value of 0.05, which decreases the confidence that might be had to reject normality based on this test and the sample size used.
> normalityTest(~HeartRate, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: HeartRate
W = 0.98598, p-value = 0.04478
Test of normality - Systolic:
Similar to the plot of HeartRate, the QQ plot of Systolic displays points that line closely on the line of best fit. The confidence bands here are quite narrow, and it is only along the ends oof the plot the points very clearly exit the confidence bands. One might conclude that the null hypothesis could be rejected for the normal distribution of Systolic based on this plot. There is a very slight negative skew here towards higher values of Systolic.
> with(ICU, qqPlot(Systolic, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ Plot of Systolic"))
[1] 200 179
While more statistically significant compared to the Shapiro Wilk test for normality of the HeartRate distribution, the p value here is not far from 0.05, indicating a 2% risk of Type I error. Based on this test, the null hypothesis could be rejected. The distribution of Systolic appears to be non-normal.
> normalityTest(~Systolic, test="shapiro.test", data=ICU)
Shapiro-Wilk normality test
data: Systolic
W = 0.98369, p-value = 0.0204
In reference to Ian Fellows’ comments about tests for normality, these tests for normality do not add much more to understanding of the ICU dataset and the contributions of these three numerical variables to the determination of Status, beyond what is provided by the xray package.
> numSummary(ICU[,c("Age", "Systolic", "HeartRate"), drop=FALSE], groups=ICU$Status, statistics=c("mean", "sd", "se(mean)"),quantiles=c(0,.25, .5, .75,1))
Variable: Age
mean sd se(mean) n
Lived 55.650 20.42818 1.614990 160
Died 65.125 16.64900 2.632438 40
Variable: Systolic
mean sd se(mean) n
Lived 135.6438 29.80151 2.356016 160
Died 118.8250 41.08084 6.495451 40
Variable: HeartRate
mean sd se(mean) n
Lived 98.500 26.97868 2.132852 160
Died 100.625 26.49304 4.188918 40
> library(psych)
> describeBy(ICU, ICU$Status)
Descriptive statistics by group
group: Lived
vars n mean sd median trimmed mad min max range
ID 1 160 457.04 276.35 438.5 454.25 346.19 8 929 921
Status* 2 160 1.00 0.00 1.0 1.00 0.00 1 1 0
Age 3 160 55.65 20.43 61.0 56.86 19.27 16 91 75
Sex* 4 160 1.38 0.49 1.0 1.34 0.00 1 2 1
Race* 5 160 1.19 0.50 1.0 1.05 0.00 1 3 2
Service* 6 160 1.58 0.49 2.0 1.60 0.00 1 2 1
Cancer* 7 160 1.10 0.30 1.0 1.00 0.00 1 2 1
Renal* 8 160 1.07 0.25 1.0 1.00 0.00 1 2 1
Infection* 9 160 1.38 0.49 1.0 1.34 0.00 1 2 1
CPR* 10 160 1.04 0.19 1.0 1.00 0.00 1 2 1
Systolic 11 160 135.64 29.80 132.0 133.97 29.65 48 224 176
HeartRate 12 160 98.50 26.98 95.0 97.48 25.20 39 192 153
Previous* 13 160 1.14 0.35 1.0 1.05 0.00 1 2 1
Type* 14 160 1.68 0.47 2.0 1.73 0.00 1 2 1
Fracture* 15 160 1.07 0.26 1.0 1.00 0.00 1 2 1
PO2* 16 160 1.07 0.25 1.0 1.00 0.00 1 2 1
PH* 17 160 1.06 0.23 1.0 1.00 0.00 1 2 1
PCO2* 18 160 1.10 0.30 1.0 1.00 0.00 1 2 1
Bicarbonate* 19 160 1.06 0.24 1.0 1.00 0.00 1 2 1
Creatinine* 20 160 1.03 0.17 1.0 1.00 0.00 1 2 1
Consciousness* 21 160 1.02 0.22 1.0 1.00 0.00 1 3 2
skew kurtosis se
ID 0.07 -1.25 21.85
Status* NaN NaN 0.00
Age -0.55 -0.82 1.61
Sex* 0.51 -1.75 0.04
Race* 2.65 5.98 0.04
Service* -0.33 -1.91 0.04
Cancer* 2.64 5.01 0.02
Renal* 3.38 9.46 0.02
Infection* 0.51 -1.75 0.04
CPR* 4.82 21.40 0.02
Systolic 0.42 0.34 2.36
HeartRate 0.44 0.17 2.13
Previous* 2.01 2.06 0.03
Type* -0.77 -1.41 0.04
Fracture* 3.20 8.27 0.02
PO2* 3.38 9.46 0.02
PH* 3.82 12.64 0.02
PCO2* 2.64 5.01 0.02
Bicarbonate* 3.58 10.89 0.02
Creatinine* 5.34 26.66 0.01
Consciousness* 8.69 74.04 0.02
--------------------------------------------------------
group: Died
vars n mean sd median trimmed mad min max range
ID 1 40 395.95 250.74 363 386.72 243.89 4 921 917
Status* 2 40 2.00 0.00 2 2.00 0.00 2 2 0
Age 3 40 65.12 16.65 68 66.47 11.86 19 92 73
Sex* 4 40 1.40 0.50 1 1.38 0.00 1 2 1
Race* 5 40 1.12 0.46 1 1.00 0.00 1 3 2
Service* 6 40 1.35 0.48 1 1.31 0.00 1 2 1
Cancer* 7 40 1.10 0.30 1 1.00 0.00 1 2 1
Renal* 8 40 1.20 0.41 1 1.12 0.00 1 2 1
Infection* 9 40 1.60 0.50 2 1.62 0.00 1 2 1
CPR* 10 40 1.18 0.38 1 1.09 0.00 1 2 1
Systolic 11 40 118.83 41.08 126 117.22 32.62 36 256 220
HeartRate 12 40 100.62 26.49 96 99.78 25.20 55 160 105
Previous* 13 40 1.18 0.38 1 1.09 0.00 1 2 1
Type* 14 40 1.95 0.22 2 2.00 0.00 1 2 1
Fracture* 15 40 1.07 0.27 1 1.00 0.00 1 2 1
PO2* 16 40 1.12 0.33 1 1.03 0.00 1 2 1
PH* 17 40 1.10 0.30 1 1.00 0.00 1 2 1
PCO2* 18 40 1.10 0.30 1 1.00 0.00 1 2 1
Bicarbonate* 19 40 1.12 0.33 1 1.03 0.00 1 2 1
Creatinine* 20 40 1.12 0.33 1 1.03 0.00 1 2 1
Consciousness* 21 40 1.52 0.82 1 1.41 0.00 1 3 2
skew kurtosis se
ID 0.37 -1.00 39.65
Status* NaN NaN 0.00
Age -0.84 0.73 2.63
Sex* 0.39 -1.89 0.08
Race* 3.46 10.72 0.07
Service* 0.61 -1.67 0.08
Cancer* 2.57 4.71 0.05
Renal* 1.44 0.09 0.06
Infection* -0.39 -1.89 0.08
CPR* 1.65 0.73 0.06
Systolic 0.60 1.40 6.50
HeartRate 0.28 -0.75 4.19
Previous* 1.65 0.73 0.06
Type* -3.98 14.16 0.03
Fracture* 3.11 7.85 0.04
PO2* 2.18 2.84 0.05
PH* 2.57 4.71 0.05
PCO2* 2.57 4.71 0.05
Bicarbonate* 2.18 2.84 0.05
Creatinine* 2.18 2.84 0.05
Consciousness* 1.03 -0.74 0.13
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for Age.
> with(ICU, tapply(Age, Status, var, na.rm=TRUE))
Lived Died
417.3107 277.1891
The following code produces the result of the LeveneTest for testing homogeneity of variances.
> leveneTest(Age ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 3.127 0.07855 .
198
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p value = 0.07855 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.
> t.test(Age~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: Age by Status
t = -2.7151, df = 198, p-value = 0.007211
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-16.35688 -2.59312
sample estimates:
mean in group Lived mean in group Died
55.650 65.125
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
As the p-value = 0.007211, 0 is not within the confidence intervals of -16.35688 to -2.59312 and t = -2.7151 is less than -1.967548, we reject the null hypothesis and conclude that the means for Age for the groups Lived and Died are not the same.
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for HeartRate.
> with(ICU, tapply(HeartRate, Status, var, na.rm=TRUE))
Lived Died
727.8491 701.8814
The following code produces the result of the LeveneTest for testing homogeneity of variances.
> leveneTest(HeartRate ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 0.008 0.929
198
Since the p-value = 0.929 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.
> t.test(HeartRate~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: HeartRate by Status
t = -0.44714, df = 198, p-value = 0.6553
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.496845 7.246845
sample estimates:
mean in group Lived mean in group Died
98.500 100.625
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
As the p-value = 0.6553, 0 is within the confidence intervals of -11.496845 to 7.246845 and t = -0.44714 is greater than -1.967548, we retain the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for HeartRate are the same among those that lived and those that died.
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Status groups for Systolic.
> with(ICU, tapply(Systolic, Status, var, na.rm=TRUE))
Lived Died
888.1301 1687.6353
The following code produces the result of the LeveneTest for testing homogeneity of variances.
> leveneTest(Systolic ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 4.1872 0.04205 *
198
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value = 0.04205 for testing the homogeneity of variances is less than 0.05, we reject the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are not equal. As such, the Welch two Sample t-test is used to analyze whether there was a significant difference in means.
> t.test(Systolic~Status, alternative="two.sided", conf.level=.95, var.equal=FALSE, data=ICU)
Welch Two Sample t-test
data: Systolic by Status
t = 2.4341, df = 49.726, p-value = 0.01856
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.938642 30.698858
sample estimates:
mean in group Lived mean in group Died
135.6438 118.8250
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
As the p-value = 0.01856, 0 is not within the confidence intervals of 2.938642 to 30.698858 and t = 2.4341 is greater than 1.9675, we reject the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for systolic blood pressure are not the same among those that lived and those that died.
Ho: the variances for Lived and Died are equal
Ha: the variances are different
Ho: the means are equal
Ha: the means are different
The following is the comparison of variances between the two Sex groups for Age.
> with(ICU, tapply(Age, Sex, var, na.rm=TRUE))
Male Female
378.4780 436.5867
The following code produces the result of the LeveneTest for testing homogeneity of variances.
> leveneTest(Age ~ Sex, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
Df F value Pr(>F)
group 1 0.1154 0.7344
198
Since the p-value = 0.7344 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.
> t.test(Age~Sex, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)
Two Sample t-test
data: Age by Sex
t = -1.3582, df = 198, p-value = 0.1759
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.708824 1.789469
sample estimates:
mean in group Male mean in group Female
56.04032 60.00000
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548
As the p-value = 0.1759, 0 is within the confidence intervals of -9.708824 to 1.789469 and t = -1.3582 is greater than - 1.9675, we retain the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for systolic blood pressure are the same between the two groups.
We will conduct z-tests between two categorical variables where the dependent variable is Vital status and the independent variable is binary.
Vital status by Service
Ho: there is no difference between the proportion of patients that died in the medical versus surgical group, risk = 0.05 Ha: there is a difference between the proportion of patients that died in the medical versus surgical group
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Service, show.col.prc = TRUE)
| Status | Service | Total | |
|---|---|---|---|
| Medical | Surgical | ||
| Lived |
67 72 % |
93 86.9 % |
160 80 % |
| Died |
26 28 % |
14 13.1 % |
40 20 % |
| Total |
93 100 % |
107 100 % |
200 100 % |
χ2=5.981 · df=1 · φ=0.185 · p=0.014 |
> x1<-26; x2<-14; n1<-93; n2<-107
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)
> varp<-p*(1-p)*(1/n1 + 1/n2)
> stdp<-sqrt(varp)
> zp<-(p1 - p2)/stdp
> zp
[1] 2.622729
> 1-pnorm(zp)
[1] 0.004361436
When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of 2.622729 does not fall within this confidence interval, thus we reject the null hypothesis that there is no difference between the proportion of patients dying in the ICU from medical procedures versus surgical procedures.
Further, since the p-value of 0.004361436 is less than 0.05, we, again, reject the null hypothesis that the proportions of the two groups are equal.
Lastly, we can see from the table that more of those that died were there for a medical service and more of those that lived were there for a surgical service.
Vital status by Cancer
Ho: there is no difference between the proportions of patients that died in the cancer versus non-cancer group, risk = 0.05 Ha: there is a difference between the proportions of patients that died in the cancer versus non-cancer group
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Cancer, show.col.prc = TRUE)
| Status | Cancer | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
144 80 % |
16 80 % |
160 80 % |
| Died |
36 20 % |
4 20 % |
40 20 % |
| Total |
180 100 % |
20 100 % |
200 100 % |
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000 |
> x1<-36; x2<-4; n1<-180; n2<-20
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)
> varp<-p*(1-p)*(1/n1 + 1/n2)
> stdp<-sqrt(varp)
> zp<-(p1 - p2)/stdp
> zp
[1] 0
> 1-pnorm(zp)
[1] 0.5
When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of 0 falls within this confidence interval, thus we fail to reject the null hypothesis that there is no difference between the proportions of patients that died with versus without cancer as part of the present problem.
Further, since the p-value of 0.5 is greater than 0.05, we, again, fail to reject the null hypothesis that the proportions of the Cancer versus non-cancer patients that died are equal.
Thus, there are equal proportions of those who died with cancer as part of the present problem and without, as well as equal proportions of those who lived with cancer as part of the present problem and without.
Vital status by Previous
Ho: there is no difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months, risk = 0.05 Ha: there is a difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Previous, show.col.prc = TRUE)
| Status | Previous | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
137 80.6 % |
23 76.7 % |
160 80 % |
| Died |
33 19.4 % |
7 23.3 % |
40 20 % |
| Total |
170 100 % |
30 100 % |
200 100 % |
χ2=0.061 · df=1 · φ=0.035 · Fisher’s p=0.624 |
> x1<-33; x2<-7; n1<-170; n2<-30
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)
> varp<-p*(1-p)*(1/n1 + 1/n2)
> stdp<-sqrt(varp)
> zp<-(p1 - p2)/stdp
> zp
[1] -0.4950738
> 1-pnorm(zp)
[1] 0.689726
When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of -0.4950738 falls within this confidence interval, thus we fail to reject the null hypothesis that there is no difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months of the current admission.
Further, since the p-value of 0.689726 is greater than 0.05, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died whether or not they were previously admitted to the ICU within the past 6 months.
Lastly, we see from the table that there was a lower percentage of patients that died with no prior ICU admission than patients that died with prior ICU admission.
Vital status by Type
Ho: there is no difference between the proportion of patients that died in the elective admission group versus the emergency admission group, risk = 0.05 Ha: there is a difference between the proportion of patients that died in the elective admission group versus the emergency admission group
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Type, show.col.prc = TRUE)
| Status | Type | Total | |
|---|---|---|---|
| Elective | Emergency | ||
| Lived |
51 96.2 % |
109 74.1 % |
160 80 % |
| Died |
2 3.8 % |
38 25.9 % |
40 20 % |
| Total |
53 100 % |
147 100 % |
200 100 % |
χ2=10.527 · df=1 · φ=0.244 · p=0.001 |
> x1<-2; x2<-38; n1<-53; n2<-147
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)
> varp<-p*(1-p)*(1/n1 + 1/n2)
> stdp<-sqrt(varp)
> zp<-(p1 - p2)/stdp
> zp
[1] -3.444743
> pnorm(zp)
[1] 0.000285801
When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of -3.444743 does not fall within this confidence interval, thus we reject the null hypothesis that there is no difference between the proportion of patients that died in the elective admission group versus the emergency admission group.
Further, since the p-value of 0.000285801 is less than 0.05, we, again, reject the null hypothesis that the proportions of the two groups are equal.
We can see from the table that the emergency admission type had a higher proportion of deaths than the elective admission type did.
We will now conduct a prop.test for categorical variables where the independent variable has more than 2 subgroups.
Vital status by Age
Ho: there is no difference between the proportion of patients that died across the 5 age groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 5 age groups
Binning the Age variable with equal-count bins.
> ICU$Age.binned <-
+ with(ICU, binVariable(Age,
+ bins=5, method='proportions',
+ labels=c('Group 1','Group 2','Group 3',
+ 'Group 4','Group 5')))
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Age.binned, show.col.prc = TRUE)
| Status | Age.binned | Total | ||||
|---|---|---|---|---|---|---|
| Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | ||
| Lived |
38 92.7 % |
31 79.5 % |
34 79.1 % |
33 75 % |
24 72.7 % |
160 80 % |
| Died |
3 7.3 % |
8 20.5 % |
9 20.9 % |
11 25 % |
9 27.3 % |
40 20 % |
| Total |
41 100 % |
39 100 % |
43 100 % |
44 100 % |
33 100 % |
200 100 % |
χ2=5.930 · df=4 · Cramer’s V=0.172 · p=0.204 |
> Died <- c( 3, 8, 9, 11, 9 )
> Total <- c( 41, 39, 43, 44, 33 )
> prop.test(Died, Total)
5-sample test for equality of proportions without continuity
correction
data: Died out of Total
X-squared = 5.93, df = 4, p-value = 0.2044
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4 prop 5
0.07317073 0.20512821 0.20930233 0.25000000 0.27272727
Since the p-value of 0.2044 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between the proportion of patients that died across the 5 age groups. We can see from the sample estimates that the proportions are indeed quite similar between the 5 age groups. However, the number of patients that died in age group 1 (the youngest age group) is much lower than the rest. The greatest proportion of patients that died is in group 5 (the oldest age group). The proportions in groups 2 and 3 are quite similar and slightly higher in group 4. However, overall, most of the groups had similar proportions of patients that died, aside from the youngest age group.
Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 4 degrees of freedom, the critical value is 9.49. Since the chi-squared value of 5.93 is less than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died across the 5 age groups.
Vital status by Systolic Blood Pressure
Ho: there is no difference between the proportion of patients that died across the 6 systolic blood pressure level groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 6 systolic blood pressure level groups
> ICU <-
+ within(ICU, {
+ Systolic.grouped <- Recode(Systolic,
+ '0:80="hypotension"; 80:120="normal"; 120:129="elevated"; 130:139="stage 1 hypertension"; 140:180="stage 2 hypertension"; 180:260="stage 3 hypertension"; ;',
+ as.factor=TRUE)
+ })
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Systolic.grouped, show.col.prc = TRUE)
| Status | Systolic.grouped | Total | |||||
|---|---|---|---|---|---|---|---|
| elevated | hypotension | normal | stage 1 hypertension | stage 2 hypertension | stage 3 hypertension | ||
| Lived |
14 82.4 % |
3 25 % |
52 85.2 % |
24 80 % |
54 83.1 % |
13 86.7 % |
160 80 % |
| Died |
3 17.6 % |
9 75 % |
9 14.8 % |
6 20 % |
11 16.9 % |
2 13.3 % |
40 20 % |
| Total |
17 100 % |
12 100 % |
61 100 % |
30 100 % |
65 100 % |
15 100 % |
200 100 % |
χ2=24.597 · df=5 · Cramer’s V=0.351 · Fisher’s p=0.002 |
> Died <- c( 3, 9, 9, 6, 11, 2 )
> Total <- c( 17, 12, 61, 30, 65, 15 )
> prop.test(Died, Total)
6-sample test for equality of proportions without continuity
correction
data: Died out of Total
X-squared = 24.597, df = 5, p-value = 0.0001667
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4 prop 5 prop 6
0.1764706 0.7500000 0.1475410 0.2000000 0.1692308 0.1333333
Since the p-value of 0.0001667 is less than 0.05, we will reject the null hypothesis that there is no difference between the proportion of patients that died in the 6 systolic blood pressure level groups. We can see from the sample estimates that the proportions are indeed different between the 6 groups, with the highest proportion in the hypotenstion range and in the stage 1 hypertension range. The proportions in the elevated, normal, stage 2 hypertension and stage 3 hypertension groups are much lower and are quite close to each other. Thus, the hypotension and stage 1 hypertension groups had higher proportions of patients that died.
Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 5 degrees of freedom, the critical value is 11.07. Since the chi-squared value of 24.597 is greater than this critical value, we will, again, reject the null hypothesis that there is no difference between the proportion of patients that died in the 6 systolic blood pressure level groups.
Vital status by Heart Rate
Ho: there is no difference between the proportion of patients that died across the bradychardia, normal heart rate and elevated heart rate groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 3 heart rate categories
> ICU <-
+ within(ICU, {
+ HeartRate.grouped <- Recode(HeartRate,
+ '0:60="bradycardia"; 60:100="normal"; 100:200="elevated"',
+ as.factor=TRUE)
+ })
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$HeartRate.grouped, show.col.prc = TRUE)
| Status | HeartRate.grouped | Total | ||
|---|---|---|---|---|
| bradycardia | elevated | normal | ||
| Lived |
12 80 % |
64 80 % |
84 80 % |
160 80 % |
| Died |
3 20 % |
16 20 % |
21 20 % |
40 20 % |
| Total |
15 100 % |
80 100 % |
105 100 % |
200 100 % |
χ2=0.000 · df=2 · Cramer’s V=0.000 · Fisher’s p=1.000 |
> Died <- c( 3, 16, 21 )
> Total <- c( 15, 80, 105 )
> prop.test(Died, Total)
3-sample test for equality of proportions without continuity
correction
data: Died out of Total
X-squared = 0, df = 2, p-value = 1
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3
0.2 0.2 0.2
Since the p-value of 1 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between proportion of patients that died across the bradychardia, normal heart rate and elevated heart rate groups. An exact p-value of 1 means that the difference in proportions of patients that died between the 3 heart rate groups is exactly 0. Thus, there were an equal number of patients that died with bradycardia, normal heart rate and elevated heart rate.
Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 0 is lower than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died having bradychardia, normal heart rate and elevated heart rate.
Vital status by Race
Ho: there is no difference between the proportion of patients that died that were White, Black or otherwise, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the race categories
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Race, show.col.prc = TRUE)
| Status | Race | Total | ||
|---|---|---|---|---|
| White | Black | Other | ||
| Lived |
138 78.9 % |
14 93.3 % |
8 80 % |
160 80 % |
| Died |
37 21.1 % |
1 6.7 % |
2 20 % |
40 20 % |
| Total |
175 100 % |
15 100 % |
10 100 % |
200 100 % |
χ2=1.810 · df=2 · Cramer’s V=0.095 · Fisher’s p=0.492 |
> Died <- c( 37, 1, 2 )
> Total <- c( 175, 15, 10 )
> prop.test(Died, Total)
3-sample test for equality of proportions without continuity
correction
data: Died out of Total
X-squared = 1.8095, df = 2, p-value = 0.4046
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3
0.21142857 0.06666667 0.20000000
Since the p-value of 0.4046 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between the proportion of patients that died that were White, Black or otherwise.
Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 1.8095 is less than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died that were White, Black or otherwise.
When looking at the sample estimates and at the table, we see that there was almost no difference between the proportion of White patients that died and the proportion of ‘Other’ patients that died. However, the proportion of Black patients that died was slightly lower than the White and ‘Other’ groups.
Vital status by Consciousness
Ho: there is no difference between the proportion of patients that died across the Conscious, Deep Stupor and Coma groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the Conscious, Deep Stupor and Coma groups
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Consciousness, show.col.prc = TRUE)
| Status | Consciousness | Total | ||
|---|---|---|---|---|
| Conscious | Deep Stupor | Coma | ||
| Lived |
158 85.4 % |
0 0 % |
2 20 % |
160 80 % |
| Died |
27 14.6 % |
5 100 % |
8 80 % |
40 20 % |
| Total |
185 100 % |
5 100 % |
10 100 % |
200 100 % |
χ2=45.878 · df=2 · Cramer’s V=0.479 · Fisher’s p=0.000 |
> Died <- c( 27, 5, 8 )
> Total <- c( 185, 5, 10 )
> prop.test(Died, Total)
3-sample test for equality of proportions without continuity
correction
data: Died out of Total
X-squared = 45.878, df = 2, p-value = 1.091e-10
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3
0.1459459 1.0000000 0.8000000
Since the p-value of 1.091e-10 is less than 0.05, we will reject the null hypothesis that there is no difference between the proportion of patients that died across the Conscious, Deep Stupor and Coma groups.
Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 45.878 is greater than this critical value, we will, again, reject the null hypothesis that there is no difference between the proportion of patients that died across the 3 Consciousness groups.
With a proportion of 1, the Deep Stupor group had the highest proportion of patients that died as all the patients with a deep stupor died. The Coma group has the second highest proportion of patients that died with a sample estimate of 80% dying. The lowest number of deaths was in the group of patients that had no coma or stupor, which is expected. Those with a deep stupor had a higher proportion of deaths than those in a coma.
From the tests of equal proportions, we note that the the following variables had a statistically significant impact on the Vital status of the patient: Service (type of procedure), Type (of admission) and Conciousness.
The following variables were not significant predictors of Vital status of the patient: Cancer, Previous (ICU admission within the last 6 months), Age, Systolic Blood Pressure, Heart Rate and Race.
> par(mfrow=c(1,3))
> boxplot(ICU$Age, main="Boxplot of Age")
> boxplot(ICU$Systolic, main="Boxplot of Systolic Blood Pressure")
> boxplot(ICU$HeartRate, main="Boxplot of Heart Rate")
As we can see above, the outliers are present in systolic blood pressure and heart rate. To see the values of the outliers, please see below, where the first row includes the outliers for systolic blood pressure, and the second includes outliers for heart rate.
> systolic_outlier<-boxplot.stats(ICU$Systolic)
> heartrate_outlier<-boxplot.stats(ICU$HeartRate)
> systolic_outlier$out
[1] 48 212 224 36 256
> heartrate_outlier$out
[1] 192
> sjt.xtab(ICU$Status, ICU$Sex, show.col.prc = TRUE)
| Status | Sex | Total | |
|---|---|---|---|
| Male | Female | ||
| Lived |
100 80.6 % |
60 78.9 % |
160 80 % |
| Died |
24 19.4 % |
16 21.1 % |
40 20 % |
| Total |
124 100 % |
76 100 % |
200 100 % |
χ2=0.012 · df=1 · φ=0.021 · p=0.913 |
> sjt.xtab(ICU$Status, ICU$Race, show.col.prc = TRUE)
| Status | Race | Total | ||
|---|---|---|---|---|
| White | Black | Other | ||
| Lived |
138 78.9 % |
14 93.3 % |
8 80 % |
160 80 % |
| Died |
37 21.1 % |
1 6.7 % |
2 20 % |
40 20 % |
| Total |
175 100 % |
15 100 % |
10 100 % |
200 100 % |
χ2=1.810 · df=2 · Cramer’s V=0.095 · Fisher’s p=0.496 |
> sjt.xtab(ICU$Status, ICU$Service, show.col.prc = TRUE)
| Status | Service | Total | |
|---|---|---|---|
| Medical | Surgical | ||
| Lived |
67 72 % |
93 86.9 % |
160 80 % |
| Died |
26 28 % |
14 13.1 % |
40 20 % |
| Total |
93 100 % |
107 100 % |
200 100 % |
χ2=5.981 · df=1 · φ=0.185 · p=0.014 |
> sjt.xtab(ICU$Status, ICU$Cancer, show.col.prc = TRUE)
| Status | Cancer | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
144 80 % |
16 80 % |
160 80 % |
| Died |
36 20 % |
4 20 % |
40 20 % |
| Total |
180 100 % |
20 100 % |
200 100 % |
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000 |
> sjt.xtab(ICU$Status, ICU$Renal, show.col.prc = TRUE)
| Status | Renal | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
149 82.3 % |
11 57.9 % |
160 80 % |
| Died |
32 17.7 % |
8 42.1 % |
40 20 % |
| Total |
181 100 % |
19 100 % |
200 100 % |
χ2=4.976 · df=1 · φ=0.179 · Fisher’s p=0.029 |
> sjt.xtab(ICU$Status, ICU$Infection, show.col.prc = TRUE)
| Status | Infection | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
100 86.2 % |
60 71.4 % |
160 80 % |
| Died |
16 13.8 % |
24 28.6 % |
40 20 % |
| Total |
116 100 % |
84 100 % |
200 100 % |
χ2=5.759 · df=1 · φ=0.182 · p=0.016 |
> sjt.xtab(ICU$Status, ICU$CPR, show.col.prc = TRUE)
| Status | CPR | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
154 82.4 % |
6 46.2 % |
160 80 % |
| Died |
33 17.6 % |
7 53.8 % |
40 20 % |
| Total |
187 100 % |
13 100 % |
200 100 % |
χ2=7.821 · df=1 · φ=0.223 · Fisher’s p=0.005 |
> sjt.xtab(ICU$Status, ICU$Previous, show.col.prc = TRUE)
| Status | Previous | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
137 80.6 % |
23 76.7 % |
160 80 % |
| Died |
33 19.4 % |
7 23.3 % |
40 20 % |
| Total |
170 100 % |
30 100 % |
200 100 % |
χ2=0.061 · df=1 · φ=0.035 · Fisher’s p=0.624 |
> sjt.xtab(ICU$Status, ICU$Type, show.col.prc = TRUE)
| Status | Type | Total | |
|---|---|---|---|
| Elective | Emergency | ||
| Lived |
51 96.2 % |
109 74.1 % |
160 80 % |
| Died |
2 3.8 % |
38 25.9 % |
40 20 % |
| Total |
53 100 % |
147 100 % |
200 100 % |
χ2=10.527 · df=1 · φ=0.244 · p=0.001 |
> sjt.xtab(ICU$Status, ICU$Fracture, show.col.prc = TRUE)
| Status | Fracture | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
148 80 % |
12 80 % |
160 80 % |
| Died |
37 20 % |
3 20 % |
40 20 % |
| Total |
185 100 % |
15 100 % |
200 100 % |
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000 |
> sjt.xtab(ICU$Status, ICU$PO2, show.col.prc = TRUE)
| Status | PO2 | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
149 81 % |
11 68.8 % |
160 80 % |
| Died |
35 19 % |
5 31.2 % |
40 20 % |
| Total |
184 100 % |
16 100 % |
200 100 % |
χ2=0.718 · df=1 · φ=0.083 · Fisher’s p=0.324 |
> sjt.xtab(ICU$Status, ICU$PH, show.col.prc = TRUE)
| Status | PH | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
151 80.7 % |
9 69.2 % |
160 80 % |
| Died |
36 19.3 % |
4 30.8 % |
40 20 % |
| Total |
187 100 % |
13 100 % |
200 100 % |
χ2=0.416 · df=1 · φ=0.071 · Fisher’s p=0.297 |
> sjt.xtab(ICU$Status, ICU$PCO2, show.col.prc = TRUE)
| Status | PCO2 | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
144 80 % |
16 80 % |
160 80 % |
| Died |
36 20 % |
4 20 % |
40 20 % |
| Total |
180 100 % |
20 100 % |
200 100 % |
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000 |
> sjt.xtab(ICU$Status, ICU$Bicarbonate, show.col.prc = TRUE)
| Status | Bicarbonate | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
150 81.1 % |
10 66.7 % |
160 80 % |
| Died |
35 18.9 % |
5 33.3 % |
40 20 % |
| Total |
185 100 % |
15 100 % |
200 100 % |
χ2=1.014 · df=1 · φ=0.095 · Fisher’s p=0.187 |
> sjt.xtab(ICU$Status, ICU$Creatinine, show.col.prc = TRUE)
| Status | Creatinine | Total | |
|---|---|---|---|
| No | Yes | ||
| Lived |
155 81.6 % |
5 50 % |
160 80 % |
| Died |
35 18.4 % |
5 50 % |
40 20 % |
| Total |
190 100 % |
10 100 % |
200 100 % |
χ2=4.112 · df=1 · φ=0.172 · Fisher’s p=0.029 |
> sjt.xtab(ICU$Status, ICU$Consciousness, show.col.prc = TRUE)
| Status | Consciousness | Total | ||
|---|---|---|---|---|
| Conscious | Deep Stupor | Coma | ||
| Lived |
158 85.4 % |
0 0 % |
2 20 % |
160 80 % |
| Died |
27 14.6 % |
5 100 % |
8 80 % |
40 20 % |
| Total |
185 100 % |
5 100 % |
10 100 % |
200 100 % |
χ2=45.878 · df=2 · Cramer’s V=0.479 · Fisher’s p=0.000 |
Statistically significant crosstabs include: Creatinine, Type, CPR, Infection,Renal, Service.
Prior to conducting a logistic regression, we would like to break down the systolic blood pressure into levels. We will use clinically defined levels for hypo, normal, elevated and hyper. Ranges below 90 are considered hypo, whereas ranges from 90 to 120 are considered normal, ranges between 120 and 129 are considered elevated, and 130 plus are considered hyper.
> ICU$Systolic<-cut(ICU$Systolic, breaks=c(0,89,119,129,256))
> levels(ICU$Systolic)<-c("hypo", "normal", "elevated", "hyper")
Below is the logistic regression model with all the predictors in the dataset. As we can see, key predictors that are shown to be statistically significant are age, cancer, systolic pressure, fracture, arterial blood gas concentration, type, and consciousness. We have some confidence in our model due to the small difference between the null and residual deviance. Some of our confidence intervals have zero in them, leading us to have some doubt in our model.
> ICU.m<-glm(formula=Status~Age+Sex+Race+Service+Cancer+Renal+Infection+CPR+Systolic+HeartRate+Previous+Type+Fracture+PO2+PH+PCO2+Bicarbonate+Creatinine+Consciousness, family=binomial(logit), data=ICU)
> summary(ICU.m)
Call:
glm(formula = Status ~ Age + Sex + Race + Service + Cancer +
Renal + Infection + CPR + Systolic + HeartRate + Previous +
Type + Fracture + PO2 + PH + PCO2 + Bicarbonate + Creatinine +
Consciousness, family = binomial(logit), data = ICU)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.45809 -0.52365 -0.17347 -0.00019 2.92298
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.913e+00 2.250e+00 -2.627 0.00861 **
Age 5.124e-02 1.829e-02 2.802 0.00508 **
SexFemale -6.228e-01 5.563e-01 -1.119 0.26295
RaceBlack -1.628e+01 1.320e+03 -0.012 0.99015
RaceOther 4.335e-01 1.312e+00 0.330 0.74115
ServiceSurgical -5.792e-01 6.411e-01 -0.904 0.36625
CancerYes 3.157e+00 1.141e+00 2.766 0.00567 **
RenalYes 1.605e-01 8.434e-01 0.190 0.84906
InfectionYes -4.125e-02 5.657e-01 -0.073 0.94186
CPRYes 1.347e+00 9.901e-01 1.360 0.17372
Systolicnormal -2.175e+00 1.044e+00 -2.084 0.03713 *
Systolicelevated -1.612e+00 1.114e+00 -1.447 0.14782
Systolichyper -2.061e+00 1.026e+00 -2.008 0.04465 *
HeartRate -9.748e-04 1.029e-02 -0.095 0.92455
PreviousYes 1.013e+00 6.937e-01 1.460 0.14439
TypeEmergency 3.396e+00 1.330e+00 2.553 0.01068 *
FractureYes 1.566e+00 1.101e+00 1.422 0.15506
PO2Yes -7.044e-01 9.923e-01 -0.710 0.47781
PHYes 1.526e+00 1.229e+00 1.241 0.21463
PCO2Yes -2.447e+00 1.235e+00 -1.981 0.04756 *
BicarbonateYes -8.800e-03 9.056e-01 -0.010 0.99225
CreatinineYes -2.035e-01 1.139e+00 -0.179 0.85827
ConsciousnessDeep Stupor 3.538e+01 2.568e+03 0.014 0.98901
ConsciousnessComa 3.373e+00 1.359e+00 2.482 0.01307 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 112.67 on 176 degrees of freedom
AIC: 160.67
Number of Fisher Scoring iterations: 17
> confint(ICU.m)
2.5 % 97.5 %
(Intercept) -10.71446902 -1.69601255
Age 0.01835185 0.09076391
SexFemale -1.76222223 0.44150426
RaceBlack -406.17117888 48.28576524
RaceOther -2.78151725 2.76940774
ServiceSurgical -1.87453755 0.66822713
CancerYes 1.04815774 5.66235244
RenalYes -1.58805858 1.78878733
InfectionYes -1.17641537 1.06336175
CPRYes -0.63018852 3.34518326
Systolicnormal -4.37115591 -0.21283200
Systolicelevated -3.96121708 0.47666682
Systolichyper -4.25385307 -0.16344121
HeartRate -0.02182989 0.01893995
PreviousYes -0.38940841 2.37091562
TypeEmergency 1.20038241 6.70600040
FractureYes -0.77450551 3.72610086
PO2Yes -2.80364875 1.17162344
PHYes -0.83446254 4.05350727
PCO2Yes -5.12715251 -0.23880882
BicarbonateYes -1.88096203 1.74562784
CreatinineYes -2.55264347 1.98603281
ConsciousnessDeep Stupor -229.97748941 NA
ConsciousnessComa 0.93736975 6.35058207
> exp(coef(ICU.m))
(Intercept) Age SexFemale
2.704937e-03 1.052574e+00 5.364495e-01
RaceBlack RaceOther ServiceSurgical
8.463168e-08 1.542613e+00 5.603338e-01
CancerYes RenalYes InfectionYes
2.350715e+01 1.174101e+00 9.595848e-01
CPRYes Systolicnormal Systolicelevated
3.845560e+00 1.135578e-01 1.995610e-01
Systolichyper HeartRate PreviousYes
1.273357e-01 9.990257e-01 2.752703e+00
TypeEmergency FractureYes PO2Yes
2.983051e+01 4.788341e+00 4.944214e-01
PHYes PCO2Yes BicarbonateYes
4.598443e+00 8.656750e-02 9.912381e-01
CreatinineYes ConsciousnessDeep Stupor ConsciousnessComa
8.159074e-01 2.322342e+15 2.915368e+01
> ICU.final.m<-glm(formula=Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit),data=ICU)
> summary(ICU.final.m)
Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = ICU)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9242 -0.5616 -0.3078 -0.1039 2.5967
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.46992 1.78152 -3.070 0.00214 **
Age 0.03715 0.01318 2.819 0.00481 **
CancerYes 2.55498 1.06544 2.398 0.01648 *
Systolicnormal -2.17702 0.87730 -2.481 0.01308 *
Systolicelevated -1.71982 0.94888 -1.812 0.06991 .
Systolichyper -2.02455 0.80102 -2.527 0.01149 *
PCO2Yes -1.46173 0.86529 -1.689 0.09116 .
ConsciousnessDeep Stupor 20.75726 1561.13214 0.013 0.98939
ConsciousnessComa 2.76952 0.97924 2.828 0.00468 **
TypeEmergency 3.65070 1.28917 2.832 0.00463 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 125.60 on 190 degrees of freedom
AIC: 145.6
Number of Fisher Scoring iterations: 16
> confint(ICU.final.m)
2.5 % 97.5 %
(Intercept) -9.46575467 -2.26757094
Age 0.01291117 0.06506014
CancerYes 0.59112234 4.95935713
Systolicnormal -4.01598420 -0.51648433
Systolicelevated -3.71304718 0.07016753
Systolichyper -3.71337117 -0.50242069
PCO2Yes -3.36824530 0.08432436
ConsciousnessDeep Stupor -140.94294699 NA
ConsciousnessComa 0.99821470 4.98622776
TypeEmergency 1.60812369 6.92583339
> exp(coef(ICU.final.m))
(Intercept) Age CancerYes
4.211568e-03 1.037849e+00 1.287104e+01
Systolicnormal Systolicelevated Systolichyper
1.133791e-01 1.790977e-01 1.320539e-01
PCO2Yes ConsciousnessDeep Stupor ConsciousnessComa
2.318346e-01 1.034580e+09 1.595094e+01
TypeEmergency
3.850154e+01
> pacman::p_load(rsample)
> set.seed(78)
> train_test_split <- initial_split(ICU)
> train <- training(train_test_split)
> test <- testing(train_test_split)
> (samp <- dim(train_test_split))
analysis assessment n p
150 50 200 24
> ICU.tr<-glm(Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit),data=train)
> summary(ICU.tr)
Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.67290 -0.48913 -0.19920 -0.00002 2.22166
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.10717 2282.51458 -0.009 0.99262
Age 0.01866 0.01534 1.217 0.22364
CancerYes 4.24030 1.76849 2.398 0.01650 *
Systolicnormal -3.07973 1.10903 -2.777 0.00549 **
Systolicelevated -4.27230 1.83340 -2.330 0.01979 *
Systolichyper -2.54336 0.99130 -2.566 0.01030 *
PCO2Yes -2.54676 1.41142 -1.804 0.07117 .
ConsciousnessDeep Stupor 40.78134 6527.94176 0.006 0.99502
ConsciousnessComa 4.33342 1.78235 2.431 0.01504 *
TypeEmergency 20.91660 2282.51417 0.009 0.99269
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 144.406 on 149 degrees of freedom
Residual deviance: 72.666 on 140 degrees of freedom
AIC: 92.666
Number of Fisher Scoring iterations: 19
> predicted.val<-predict(ICU.tr, newdata=test)
> head(predicted.val)
2 7 8 11 15 17
-2.169124 -23.011067 -2.174011 -22.624007 -1.502103 -22.624007
> predicted.prob <- predict(ICU.tr, test, type = "response")
> head( predict( ICU.tr, test, type="response") )
2 7 8 11 15
1.025576e-01 1.014894e-10 1.021087e-01 1.494578e-10 1.821121e-01
17
1.494578e-10
> predicted.classes <- ifelse( predicted.prob > 0.5, "Lived", "Dead" )
> head(predicted.classes)
2 7 8 11 15 17
"Dead" "Dead" "Dead" "Dead" "Dead" "Dead"
> pacman::p_load(car)
> car::vif(ICU.tr)
GVIF Df GVIF^(1/(2*Df))
Age 1.073842 1 1.036264
Cancer 1.527300 1 1.235840
Systolic 1.419763 3 1.060155
PCO2 1.980314 1 1.407236
Consciousness 1.931509 2 1.178892
Type 1.139286 1 1.067374
> fit0<-glm(Status~1, family=binomial(logit), data=train)
> summary(fit0)
Call:
glm(formula = Status ~ 1, family = binomial(logit), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6428 -0.6428 -0.6428 -0.6428 1.8322
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.4718 0.2095 -7.024 2.16e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 144.41 on 149 degrees of freedom
Residual deviance: 144.41 on 149 degrees of freedom
AIC: 146.41
Number of Fisher Scoring iterations: 4
> fitall<-glm(Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit), data=train)
> summary(fitall)
Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.67290 -0.48913 -0.19920 -0.00002 2.22166
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.10717 2282.51458 -0.009 0.99262
Age 0.01866 0.01534 1.217 0.22364
CancerYes 4.24030 1.76849 2.398 0.01650 *
Systolicnormal -3.07973 1.10903 -2.777 0.00549 **
Systolicelevated -4.27230 1.83340 -2.330 0.01979 *
Systolichyper -2.54336 0.99130 -2.566 0.01030 *
PCO2Yes -2.54676 1.41142 -1.804 0.07117 .
ConsciousnessDeep Stupor 40.78134 6527.94176 0.006 0.99502
ConsciousnessComa 4.33342 1.78235 2.431 0.01504 *
TypeEmergency 20.91660 2282.51417 0.009 0.99269
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 144.406 on 149 degrees of freedom
Residual deviance: 72.666 on 140 degrees of freedom
AIC: 92.666
Number of Fisher Scoring iterations: 19
> anova(fitall, fit0, test="Chisq")
Analysis of Deviance Table
Model 1: Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + Type
Model 2: Status ~ 1
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 140 72.666
2 149 144.406 -9 -71.74 6.934e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> pacman::p_load(survey)
> regTermTest(ICU.tr, "Age")
Wald test for Age
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 1.480864 on 1 and 140 df: p= 0.22569
> regTermTest(ICU.tr, "Cancer")
Wald test for Cancer
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 5.748968 on 1 and 140 df: p= 0.017817
> regTermTest(ICU.tr, "Systolic")
Wald test for Systolic
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 3.105522 on 3 and 140 df: p= 0.028612
> regTermTest(ICU.tr, "Type")
Wald test for Type
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 8.3976e-05 on 1 and 140 df: p= 0.9927
> regTermTest(ICU.tr, "PCO2")
Wald test for PCO2
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 3.255852 on 1 and 140 df: p= 0.073319
> regTermTest(ICU.tr, "Consciousness")
Wald test for Consciousness
in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness +
Type, family = binomial(logit), data = train)
F = 2.955636 on 2 and 140 df: p= 0.055302
> pacman::p_load(rpart, rpart.plot, rattle, dplyr)
> set.seed(2715)
>
> #final model
> icutree=rpart(Status~Age+Cancer+Systolic+Type+PCO2+Consciousness+Type+Fracture+Sex+Race+Service+Renal+Infection+CPR+HeartRate+Previous+PO2+PH+Bicarbonate+Creatinine+Consciousness, method="class", data=ICU)
> #rpart.plot(icutree,
> #extra = 104, # show fitted class, probs, percentages
> #box.palette = "GnYlRd", # color scheme
> #prefix = "Status\n",
> #branch.lty = 1, # solid branch lines, 3 = dotted
> #shadow.col = "gray", # shadows under the node boxes
> #split.prefix = "is ", # put "is " before split text
> #split.suffix = "?", # put "?" after split text
> #nn = TRUE, # display the node numbers
> #tweak = 1.2)
> fancyRpartPlot(icutree)
> printcp(icutree)
Classification tree:
rpart(formula = Status ~ Age + Cancer + Systolic + Type + PCO2 +
Consciousness + Type + Fracture + Sex + Race + Service +
Renal + Infection + CPR + HeartRate + Previous + PO2 + PH +
Bicarbonate + Creatinine + Consciousness, data = ICU, method = "class")
Variables actually used in tree construction:
[1] Consciousness Systolic
Root node error: 40/200 = 0.2
n= 200
CP nsplit rel error xerror xstd
1 0.275 0 1.000 1.000 0.14142
2 0.075 1 0.725 0.725 0.12449
3 0.010 2 0.650 0.725 0.12449
> predicted.classes <- icutree %>% predict(, data=icu, type = "class")
> head(predicted.classes, 12)
1 2 3 4 5 6 7 8 9 10 11 12
Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived
Levels: Lived Died
> mean(predicted.classes == ICU$Status)
[1] 0.87
> plotcp(icutree)
A work by Ekene Olatunji, Caterine Nassralla, Victoria Chin, Bassam Chamas, and Sajiya Somji
Msc. eHealth
DeGroote School of Business
eHealth 705 Statistics for eHealth - Final Project